Skip to content

mirrors stats script#169

Merged
abizer merged 6 commits into
ocf:masterfrom
abizer:mirrors-script
Aug 2, 2017
Merged

mirrors stats script#169
abizer merged 6 commits into
ocf:masterfrom
abizer:mirrors-script

Conversation

@abizer
Copy link
Copy Markdown
Member

@abizer abizer commented Jun 17, 2017

No description provided.

Comment thread modules/ocf_mirrors/manifests/init.pp Outdated

'/usr/local/sbin/record-mirrors-stats':
content => template('ocf_mirrors/record-mirrors-stats.py.erb'),
mode => '0644';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't think this should be world-readable to be consistent with our use of secrets elsewhere.
  2. Shouldn't this be executable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one sec will fix

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably store config separately as @chriskuehl says and as we do for all of our other programs.

Comment thread modules/ocf_mirrors/manifests/init.pp Outdated

'/usr/local/sbin/record-mirrors-stats':
content => template('ocf_mirrors/record-mirrors-stats.py.erb'),
mode => '0644';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's in sbin it should be executable. additionally, if it has secrets, it shouldn't be world-readable and should have show_diff => false

ideally don't template a python file though. store the config separately.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, my mistake

I modeled this after how it's done in the labstats script, I guess I'll fix this and then fix that as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it now uses environment variables, seeded by cron. The password is stored in the root crontab so it should be secure enough + no need to add another config file. Does that seem reasonable?

@@ -0,0 +1,54 @@
#!/usr/bin/python
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python3?


import os
import pymysql
from datetime import date
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group imports into stdlib / 3rd party / application per PEP8: https://www.python.org/dev/peps/pep-0008/#imports

pre-commit should enforce this for you?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit didn't say anything about this, I'll re-install it I guess

sources = [mirrored for mirrored in os.listdir(MIRRORS_PATH)
if os.path.isdir(os.path.join(MIRRORS_PATH, mirrored)) and not mirrored.startswith('.')]

sources.append('other') # catchall
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 spaces before inline comments per PEP8: https://www.python.org/dev/peps/pep-0008/#inline-comments

pre-commit isn't enforcing this?

def build_dists():
return { mirrored: { 'up': 0, 'down': 0 } for mirrored in sources }

def process_file(fn):
Copy link
Copy Markdown
Member

@chriskuehl chriskuehl Jun 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 blank lines before/after functions per PEP8 (going to stop commenting these, pre-commit should do this for you)

cursorclass=pymysql.cursors.DictCursor,
)

c = conn.cursor()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a context manager

with conn as cursor:
   ...

for line in f:
stats = line.split()
dist = stats[6]
dist = dist.split('/')[1] if '/' in dist else dist
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you include a comment with an example line so people reading the script can see what is being parsed?

Comment thread modules/ocf_mirrors/manifests/init.pp Outdated

cron {
'mirrors-stats':
command => '/usr/local/sbin/record-mirrors-stats',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know how this works if it's not executable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to work fine for the last two weeks I let it run, but yeah I need to make it +x

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe cron runs the crunchbang without checking if it's +x? Sounds fishy to me.

dist = stats[6]
dist = dist.split('/')[1] if '/' in dist else dist

if stats[8][0] in ('2', '3') and dist in dists: # http status code is 2xx/3xx
Copy link
Copy Markdown
Member

@matthew-mcallister matthew-mcallister Jun 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment this as well as the above lines, please.

EDIT: I'm referring to the stats[8][0] part of course.

@@ -0,0 +1,63 @@
#!/usr/bin/python
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python files should either drop the .py if they're not intended to be imported (and only executed), or use underscores so they can be imported

our tradition is typically to drop the extension

OCFSTATS_PWD = os.environ['OCFSTATS_PWD']
MIRRORS_PATH = '/opt/mirrors/ftp'
LOG_PATH = '/var/log/apache2'
LOG_NAME = 'mirrors.ocf.berkeley.edu_access.log'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I forgot to mention this before, but the reason why I was using mirrors.ocf.berkeley.edu_access.log.1 instead of this was because the logs get rotated at around 6 AM, so if you check them once a day, unless you check very close to when they are rotated, you lose some data. For example, this is currently checked at midnight, so all the data from midnight to around 6 AM is currently lost since it is rotated before the next midnight.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option I just thought of is to scan both log files and check if the date is within the current day, that might be the best if you truly want the numbers to be representative of the actual day boundaries instead of 6 AM - 6 AM the next day. (honestly I don't think it matters, but it wouldn't be too bad to scan both log files either)

@abizer
Copy link
Copy Markdown
Member Author

abizer commented Jun 28, 2017 via email


# extract dist name from request url
# '/debian/pool/main/h/hwdata/...' -> 'debian'
dist = stats[6]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this script as a bit of a hack to quickly see what traffic levels we had across dists, but ideally there would be a comment here explaining why these indices were chosen with maybe an example line:

The format comes from here:

%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O

which translates to something like

8.8.8.8 - - [26/Jun/2017:00:05:13 -0700] "GET /centos/7.3.1611/path/here/blah HTTP/1.1" 200 3735 "-" "urlgrabber/3.10 yum/3.4.3" 147 3971

so that's why the 6th index is needed (%r corresponds to the path because there is an extra space in the timestamp), and the 8th index below gives the status code so that can be filtered for only successes. -1 (%O) and -2 (%I) below give the bytes downloaded and uploaded as seen by Apache respectively.

@jvperrin
Copy link
Copy Markdown
Member

Meh, I think that's a bit much and would be surprising behavior honestly that one system has a different cron.daily time than the others for a script to collect mirror metrics. I think just scraping both log files and discarding those in the wrong date would be reasonably easy. The script only runs once a day, so if it takes a few more seconds it's not a big deal at all.

@chriskuehl
Copy link
Copy Markdown
Member

+1 for jvperrin's approach

@abizer abizer merged commit b1a9353 into ocf:master Aug 2, 2017
matthew-mcallister pushed a commit to matthew-mcallister/puppet that referenced this pull request Oct 29, 2017
add mirrors stats script and cronjob
thank based @jvperrin
ethanwu10 added a commit that referenced this pull request Jan 17, 2023
Some history: the variable was introduced in #169 when stat tracking was
added, but #925 changed the implementation and didn't use the variable
any more (probably because it was so far away from where it was used).
This commit moves it next to where it is used now.
ethanwu10 added a commit that referenced this pull request Jan 17, 2023
* fix(mirrors): reload nginx when certs change

Previously, nginx was not picking up renewed certs.

* refactor(mirrors): move+use variable for ocfstats password

Some history: the variable was introduced in #169 when stat tracking was
added, but #925 changed the implementation and didn't use the variable
any more (probably because it was so far away from where it was used).
This commit moves it next to where it is used now.
singingtelegram pushed a commit that referenced this pull request Apr 30, 2023
* fix(mirrors): reload nginx when certs change

Previously, nginx was not picking up renewed certs.

* refactor(mirrors): move+use variable for ocfstats password

Some history: the variable was introduced in #169 when stat tracking was
added, but #925 changed the implementation and didn't use the variable
any more (probably because it was so far away from where it was used).
This commit moves it next to where it is used now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants